System and method for generating light field images

ABSTRACT

A system and method can include receiving a set of views, encoding the set of views, and displaying the set of views such that they are perceived as a holographic image.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. application Ser. No. 17/226,404, filed 9 Apr. 2021, which claims the benefit of U.S. Provisional Application No. 63/007,605, filed 9 Apr. 2020 and U.S. Provisional Application No. 63/120,007, filed 1 Dec. 2020, each of which is incorporated in its entirety by this reference.

TECHNICAL FIELD

This invention relates generally to the light field imaging field, and more specifically to a new and useful system and method in the light field imaging field.

BACKGROUND

Light field images are often generated using a plurality of images of a scene taken from different perspectives. To generate high quality light field images, many images and/or high resolution images can be required. This can lead to very large data structures which can be slow to render, display, process, and/or transmit.

Thus, there is a need in the light field imaging field to create a new and useful system and method. This invention provides such new and useful system and method.

BRIEF DESCRIPTION OF THE FIGURES

FIG. 1 is a schematic representation of the system.

FIG. 2 is a schematic representation of a variant of the method.

FIGS. 3A, 3B, and 3C are schematic representations of example displays.

FIG. 4 is a schematic representation of an example of generating a light field image, where the light field image is stored as a video that can be stored as metadata.

FIG. 5 is a schematic representation of an example of generating a light field image, where the light field image is stored as a depth array.

FIG. 6 is a schematic representation of an example of decoding an encoded light field image, where the encoded light field image is decoded by interpolating between views.

FIG. 7 is a schematic representation of an example of storing the light field image as a 3D representation of the scene.

FIG. 8 is a schematic representation of an example of decoding an encoded light field image that was encoded as a 3D representation by using a set of virtual cameras to generate the light field image.

FIG. 9 is a schematic representation of an example of generating a light field video, where the light field video is encoded in a ‘zig zag’ video format.

FIG. 10 is a schematic representation of an example of generating a light field video, where a subset of the light field frames of the light field video are stored as difference light field images computed relative to the preceding frame.

FIG. 11 is a schematic representation of an example of decoding a light field video, where the light field video is decoded by adding a difference light field image (e.g., in a quilt format) corresponding to one frame to the light field quilt image corresponding to the previous frame.

FIG. 12 is a schematic representation of an example of a quilt image.

FIG. 13A is an image of an example of a displayed lightfield image that is uncompressed.

FIG. 13B is an image of an example of a displayed lightfield image that is compressed.

FIG. 13C is an image of an example of a displayed multi-lenticular lightfield image.

FIGS. 14A and 14B are schematic representations of examples of a double lenticular representation of a lightfield image.

FIGS. 15A and 15B are schematic representations of examples of the method.

FIG. 16 is a schematic representation of an example of view synthesis or view interpolation using machine learning.

FIG. 17 is a schematic representation of an example of encoding a lightfield image as a polynomial.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

The following description of the preferred embodiments of the invention is not intended to limit the invention to these preferred embodiments, but rather to enable any person skilled in the art to make and use this invention.

1. Overview.

As shown in FIG. 2, a method can include acquiring views and generating encoded light field image(s). The method can optionally include processing the views, displaying the light field image(s), and/or any suitable steps.

As shown in FIG. 1, a system can include a computing system. The system can optionally include a camera array, a display, and/or any suitable components. The computing system can include a processing module, an encoding module, a decoding module, a communication module, a storage module, and/or any suitable modules.

The system and method can function to acquire a plurality of views of a scene (e.g., from different perspectives), generate a light field image and/or light field video from the plurality of views, encode the light field image and/or light field video, and/or decode encoded light field images and/or light field videos. The system and method can enable compact light field image transmission, while preserving high-resolution display quality.

2. Benefits.

Variations of the technology can confer several benefits and/or advantages.

First, variants of the technology can decrease the size (e.g., amount of memory required to store the light field images and/or the bandwidth required to transmit the light field images, etc.) of the light field image (and/or light field video) by up to 90% (e.g., 10%, 20%, 30%, 40%, 50%, 75%, 80%, 90%, etc. compression ratio and/or data rate savings). The decreased size can facilitate and/or enable the light field image and/or light field video use, processing, transmission, display, storage, and/or any suitable manipulation in real-time or near-real time. In specific examples, encoding the light field images and/or light field videos (e.g., in a compressed format) can decrease the size of the light field images and/or light field video.

Second, variants of the technology can enable faster light field image (and/or light field video) generation and/or rendering (e.g., as compared to ray-based encoding). In specific examples, raster-based encoding methods can require less processing power and can occupy less space than ray-based light field encoding methods.

Third, variants of the technology can enable higher quality lightfield images to be generated on weaker computing systems and/or computing systems that are not able to support a wide variety of image formats. This effect can be seen, for example, by comparing FIGS. 13A, 13B, and 13C. FIG. 13A shows an example of a lightfield image where the image format is uncompressed or compressed with low losses. FIG. 13B shows an example of the same lightfield image as FIG. 13A, but where the image format is compressed using a lossy compression (e.g., yuv420 pixel format). FIG. 13C shows an example of the same lightfield image as FIG. 13B using the same compression, but where the light field image has been multilenticularized. Comparing FIGS. 13B and 13C, the edges of objects in FIG. 13C are sharper and the colors are closer to the true color (as represented in FIG. 13A) than in FIG. 13B.

However, variants of the technology can confer any other suitablebenefits and/or advantages.

3. System.

The system can function to acquire a plurality of views of a scene (e.g., from different perspectives), generate a light field image from the plurality of views, encode the light field image(s), and decode the encoded light field image(s). The light field image 400 can be a still image (e.g., an array of still images), a frame of a video (e.g., a lightfield video, a timeseries of arrays of images, an array of videos), a computer generated image, and/or any suitable image.

A light field image can include a set of views. The light field image(s) can include any suitable number of views between 1 and 250, such as 2, 4, 8, 10, 20, 40, 45, 50, 60, 70, 80, 90, 100, 110, 120, 130, 135, 150, 200, 250, and/or any value therebetween. However, the light field image(s) can include greater than 250 views and/or any suitable number of views.

Each view is preferably collected from a different camera position (e.g., shows the scene from different perspectives, shows the scene from overlapping perspectives, etc.), but two or more views can be collected from the same location. The light field image(s) can be used to display a 3D representation of the scene (e.g., a holographic image), display a 2D representation of the scene, and/or be used for any purpose. The views are preferably contemporaneously or concurrently sampled (e.g., within a threshold time of each other), but can be sampled at any other suitable time.

Each view is preferably an image including an array of pixels, but can alternatively be a video or other data structure. The image is preferably associated with one or more color channels (e.g., a red, green, and blue channel), but can additionally or alternatively be associated with a depth channel and/or other channel. Each pixel of the image is preferably associated with a value for each channel of the image, but can additionally or alternatively lack a value for one or more channels. The value is preferably generated by the camera sampling the view, but can additionally or alternatively be determined by an auxiliary sensor pixel-aligned with the camera, by calculating the value (e.g., inferring the depth per pixel using photogrammetric techniques, stereovision techniques, etc.), and/or otherwise determined.

Each view of the plurality of views can be indexed, named, tagged, associated with a source camera identifier (e.g., identifier for the camera sampling the respective view), and/or otherwise uniquely identified (e.g., within the light field image, globally, etc.). However, the views do not have to be uniquely identified. The views are preferably indexed consecutively from 1 to N (wherein N is the total number of views), but can be indexed from 0 to N−1, and/or indexed in any suitable manner. Each view can be indexed according to a corresponding camera position (e.g., of a camera position corresponding to the view within the camera array), corresponding camera number, corresponding camera (e.g., of the camera corresponding to the view), based on an orientation of the views (e.g., left-most view corresponds to number 1 while the right-most view corresponds to number N, right-most view corresponds to number 1 while the left-most view corresponds to number N, top-most view corresponds to number 1 while the bottom-most view corresponds to number N, bottom-most view corresponds to number 1 while the top-most view corresponds to number N, etc.), randomly, pseudo-randomly, and/or any suitable index. The view indexing is preferably the same for all light field images, but the view numbering can vary between light field images. The views can be real views (e.g., images acquired by an optical sensor of a camera) and/or virtual views (e.g., generated by a virtual camera, renders, models, simulations, etc.).

The light field images can be quilt images, photosets, and/or have any suitable format.

A quilt image 450, as shown for example in FIG. 12, is preferably an a×b array of views, where a and b can be any number between 1 and N (e.g., the total number of views). The product of a and b is preferably N, but can be greater than or less than N (e.g., when one or more views is discarded). Within the quilt image, the set of views can be arranged (or indexed) in a raster (e.g., starting at the top left and rastering horizontally through views to the bottom right, starting at the top left and rastering vertically through views to the bottom right, starting at the bottom left of the quilt image and rastering horizontally through views to the top right, starting at the bottom left of the quilt image and rastering vertically through views to the top right, etc.), in a boustrophedon, randomly, and/or in any suitable order. The view arrangement within the quilt image preferably mirrors the arrangement (e.g., position) of the source camera within the camera array that sampled the respective views, but can be otherwise determined. The starting view can be associated with the first camera (e.g., wherein each camera is assigned a camera number), the left-most camera of the camera array, the right-most camera of the camera array, the center camera of the camera array, the top-most camera of the camera array, the bottom-most camera of the camera array, a random camera, and/or any suitable camera of the camera array. However, the quilt image can be arranged in any suitable manner.
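For illustration, a minimal sketch (in Python, with NumPy) of tiling a set of views into a quilt image follows; the function name make_quilt, the bottom-left starting corner, and the 9×5 tiling are assumptions chosen for the example, not requirements of the method.

```python
import numpy as np

def make_quilt(views, rows, cols):
    """Tile equal-sized views (each an HxWxC array) into a rows x cols
    quilt, rastering horizontally from the bottom-left to the top-right
    (one of the raster orders named above)."""
    h, w, c = views[0].shape
    quilt = np.zeros((rows * h, cols * w, c), dtype=views[0].dtype)
    for i, view in enumerate(views):
        row = rows - 1 - (i // cols)          # fill the bottom row first
        col = i % cols
        quilt[row * h:(row + 1) * h, col * w:(col + 1) * w] = view
    return quilt

# Example: 45 synthetic views arranged as a 9 x 5 quilt (a*b == N).
views = [np.full((8, 8, 3), i, dtype=np.uint8) for i in range(45)]
quilt = make_quilt(views, rows=9, cols=5)     # shape (72, 40, 3)
```

In this sketch the quilt rows are filled bottom-first so that view 1 lands at the bottom-left tile, mirroring one of the raster orders named above.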

A light field video can include a series of light field images (e.g., frames 400), each associated with a different time point, or include any other suitable format.

The encoded light field images and encoded light field videos are preferably compressed relative to the light field images and light field videos respectively (e.g., require less memory to store, require less bandwidth to transmit, etc.), but can be related to the light field images in any suitable manner. The encoded light field images can be: video (e.g., wherein each frame of the video corresponds to a view), depth arrays (e.g., depth quilts), 3D reconstructions, difference views (e.g., representing a difference between a first and second view within the light field image), polynomials (e.g., Lagrange polynomials), wavelets, and/or any suitable formats. The encoded light field video is preferably a video (e.g., wherein each frame is stored as metadata within a representative view of the frame; wherein every set of N frames corresponds to a light field image; etc.), but can additionally or alternatively include: a series of 3D reconstructions (and/or pose parameters thereof); a series of depth arrays each representative of a frame; a series of difference light field images (e.g., difference between a first and second frame of the light field video; set of difference views between the same indexed view across successive frames); and/or any suitable format. The encoded light field video can be formed using zigzag compression (e.g., ping pong compression), difference compression, and/or any suitable compression.

The light field image(s) can optionally be associated with metadata. The metadata can be associated with (e.g., one or more of the following can store or otherwise be tagged with the metadata) a data structure such as: a view (e.g., a key view, a thumbnail, an image, a subportion of the light field image, etc.), a key light field frame, the encoded light field image, the encoded light field video, and/or any other suitable data structure. Examples of metadata can include: type of light field image (e.g., original light field image representation such as ‘quilt image,’ ‘photoset,’ ‘depth quilt,’ etc.; encoded light field image representation such as ‘video,’ ‘raster,’ ‘ray,’ ‘zigzag,’ ‘depth array’ or ‘depth quilt,’ ‘scene graph,’ ‘3D reconstruction,’ ‘polynomial,’ ‘wavelet,’ format as described below, etc.), camera data (e.g., camera calibrations such as intrinsic parameters and/or extrinsic parameters for one or more cameras of the camera array; camera poses; camera distance from features of the scene; etc.), view data (e.g., tiling, crop region, crop orientation, view processing, view arrangement, view order, total number of views, etc.), codec data (e.g., compression algorithm used), technical metadata (e.g., camera pose, camera operation parameters, etc.), descriptive metadata (e.g., titles, captions, information provided by the user, etc.), administrative metadata (e.g., licensing information, owner information, etc.), structural metadata (e.g., view index, frame order, etc.), and/or any suitable metadata. The metadata can be appended to the data structure, embedded within the data structure, or otherwise associated with the data structure.
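As a purely hypothetical illustration of the kinds of metadata listed above, expressed here as a Python dictionary; every field name and value is an assumption for the example:

```python
# Hypothetical metadata for an encoded light field image; every field
# name and value here is an assumption chosen for illustration.
metadata = {
    "type": "quilt image",           # original light field representation
    "encoding": "video",             # encoded representation
    "view_data": {
        "total_views": 45,
        "view_order": "ascending",   # view arrangement
        "tiling": (9, 5),            # quilt rows x columns
    },
    "camera_data": {
        "focal_length_px": 1400.0,   # intrinsic parameter
        "baseline_mm": 10.0,         # extrinsic: separation between cameras
    },
    "codec": "h264",                 # compression algorithm used
}
```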

The optional camera array 100 functions to acquire views associated with a light field image. The camera array preferably includes a plurality of cameras, but can include a single camera. Each camera can be fixed (e.g., be mounted to have a static relative orientation, static absolute orientation, etc.) or moveable. The number of cameras in the camera array is preferably the same as the number of views in the light field image 400 and/or light field frame 400. However, the number of cameras in the camera array can be less than the number of views (e.g., when one or more cameras are mounted on a gantry, track, robot, motor, and/or other movement system and acquire images from more than one perspective) or greater than the number of views (e.g., to provide redundancy; to provide options for different perspectives such as above, below, wide view, narrow view, etc.). Each camera is preferably synchronized (e.g., acquires an image and/or frame within 100 ms of the other cameras), but the cameras can be unsynchronized. The image size (e.g., view size) is preferably the same for each camera (e.g., same size optical sensor for each camera, same pixel pitch, same pixel arrangement, etc.), but can be different (e.g., different optical sensor for each camera, different pixel pitch, different pixel arrangement, etc.). The camera array is preferably calibrated (e.g., camera pose for each camera known, intrinsic parameters for each camera known, extrinsic parameters for each camera known, etc.), but can be uncalibrated. The system can optionally include: one or more depth sensors aligned with each camera and/or the camera array (e.g., time of flight sensors, LIDAR, projected light sensors, etc.) and/or other sensors.

The optional display(s) 300 functions to display light field images and/or light field videos. The display can optionally display any suitable image and/or view. The display is preferably configured to display light field images that are formatted as a quilt image (e.g., as shown in FIG. 12). However, the display can display light field images that are formatted as a photoset, and/or in any suitable format.

The display is preferably configured or arranged to display the light field image as a holographic image (e.g., an image that appears three dimensional, such as where the display includes cues or other visual information that leads to a perception of a three dimensional image), but can be configured to display the lightfield image as a 2D image, and/or otherwise be configured. Viewer(s) preferably perceive the light field image as three dimensional without using a peripheral (e.g., special glasses, headset, etc.). However, the display can use a peripheral (e.g., to enable the perception of the light field image as three dimensional, to enhance the perception of depth, to augment the light field image, etc.). The display is preferably configured to display the light field image to a plurality of viewers (e.g., where each viewer can see a different perspective of the three dimensional image, where each viewer can see the same perspective of the three dimensional image, etc.), but can display the light field image to a single viewer.

The display 300 can include one or more: light sources 320, optical elements 340 (e.g., lenses; polarizers; waveplates; filters such as neutral density filters, color filters, etc.; beam steerers; liquid crystals; mirrors; etc.), parallax generators 360 (e.g., lenticular arrays, fly lens, etc.), optical volumes 380, volumetric guides 385, and/or any suitable components. In specific examples, the display can be as shown in FIG. 3A, 3B, or 3C; any suitable display as disclosed in U.S. Pat. No. 10,191,295 entitled ‘ADVANCED RETROREFLECTING AERIAL DISPLAYS’, filed on 5 Jan. 2018, or U.S. Pat. No. 10,298,921 entitled ‘SUPERSTEREOSCOPIC DISPLAY WITH ENHANCED OFF-ANGLE SEPARATION,’ filed on 24 Jul. 2018, each of which is incorporated in its entirety by this reference; and/or any suitable display.

In variants including a plurality of displays, each display can be the same or different from the other displays.

The computing system 200 functions to process views, generate light field image(s) 400, generate light field video(s) 405, control the camera array and/or display, encode the light field image(s) and/or light field video(s), and/or decode the light field image(s) and/or light field video(s). The computing system can be local (e.g., to the camera array, to a camera of the camera array, to each camera of the camera array, to one or more displays, etc.), remote (e.g., cloud computing, server, network, etc.), and/or distributed (e.g., between a local and a remote computing system). The computing system can be in communication with the camera array, a subset of cameras of the camera array, the display(s), and/or with any suitable components. In an illustrative example, a computing system can include or be a single board computer such as a Raspberry Pi™. However, any suitable computing system can be used. The computing system (and/or a graphics processing unit (GPU) thereof) can support RGB (e.g., RGB24, RGB32, RGB555, RGB565, RGB888, etc.), YUV (e.g., YUV411, YUV420, YUV422, YUV444, etc.), CMYK, YIQ, YCbCr, YPbPr, xvYCC, HSV, HSL, CIE, and/or any suitable color encoding format. The computing system is preferably able to process at least 4K videos (e.g., resolutions up to 3840×2160, 4096×2160, 7680×4320, other aspect ratios with a comparable total pixel count, etc., at frame rates of 20 Hz, 30 Hz, 60 Hz, etc.) but can be able to process HD videos (e.g., resolutions up to 1080p, 720p, etc.), low resolution videos, still images, and/or any image or video size. The computing system can include a processing module, a storage module, an encoding module, a decoding module, a communication module, and/or any suitable modules.

The processing module functions to process views, light field images, and/or light field videos. The processing module can apply transformations (e.g., translation, scaling, homothety, similarity transformations, reflection, rotation, shear mapping, affine transformations, projective transformations, Euclidean transformations, etc.), crop views, compress views, align views (e.g., align one or more features between views), rectify views (e.g., modify views to be on the same epipolar line), correct views (e.g., modify brightness, modify contrast, modify color, remove one or more pixels, etc.), and/or process the views in any suitable manner. In some variants, the processing module can additionally or alternatively lenticularize (e.g., by applying a display calibration to the lightfield image, where the lenticularized image is displayed by the display) and/or multilenticularize the light field image. In a first illustrative example, a lightfield image can be lenticularized (or multilenticularized) at a processing module that is local to a display. In a second illustrative example, a lightfield image can be lenticularized using a processing module that is integrated in a remote computing server (e.g., a cloud computing system that has access to a calibration for a display). However, the light field image can otherwise be lenticularized. The processing module can include one or more: GPUs, CPUs, TPUs, microprocessors, and/or any other suitable processor.

The communication module functions to receive and transmit data (e.g., images, instructions, etc.) and/or metadata. The communication module can enable long-range and/or short-range communication. In specific examples, the communication module can include cellular radios (e.g., broadband cellular network radios) such as radios operable to communicate using 3G, 4G, and/or 5G technology, Wi-Fi radios, Bluetooth (e.g., BLE) radios, Zigbee radios, Z-wave radios, Thread radios, wired communication modules (e.g., wired interfaces such as coaxial cables, USB interfaces, fiber optic, waveguides, etc.), and/or any other suitable communication subsystems. The communication module can be included in the camera array, the central computing system, and/or any suitable computing system.

The storage module (e.g., memory) functions to store views, lightfield images, encoded light field images, encoded light field videos, and/or data (e.g., calibration data, camera pose, etc.). The storage module can store: acquired view(s), processed view(s), light field image(s), video(s), light field video(s), camera position(s), and/or any suitable data. The storage module can include volatile or nonvolatile memory.

The encoding module functions to generate encoded light field images and/or encoded light field videos from the views. The encoded light field images (and/or light field videos) are preferably compressed but can be uncompressed. The encoding module can arrange the views (e.g., organize the views into a quilt image, organize the views by view number, organize the views by perspective, etc.), determine depth maps (e.g., for one or more views), generate 3D reconstructions of the scene, calculate differences between views (and/or light field frames), determine key views (and/or key light field frames), store metadata with the key views (and/or key light field frames), compress the views, and/or perform any suitable steps.

The encoding module can include a codec, which functions to encode the light field image and/or light field video. The codec can include: lossy algorithms (e.g., transform coding such as discrete cosine transform, wavelet transform, etc.; reducing the color space; chroma subsampling; fractal compression; MPEG-4; yuv420; etc.), visually lossless algorithms, and/or lossless algorithms (e.g., run-length encoding, area image compression, predictive coding, entropy coding, dictionary encoding such as LZ, LZW, etc.; DEFLATE; chain codes; H.264; H.265; motion jpeg 2000; etc.).

The decoding module functions to convert the encoded light field image and/or encoded light field videos to a decoded format. The decoded format is preferably a light field image (and/or light field video), but can be any suitable format. The decoded format can depend on the display(s). The decoding module can arrange the views (e.g., organize the views into a quilt image, organize the views by view number, organize the views by perspective, etc.), interpolate between views (e.g., between depth map representations of views), generate views from 3D reconstruction(s) (e.g., using virtual camera(s)), add two or more views (and/or light field frames), determine key views (and/or key light field frames), determine metadata associated with key views (and/or key frames), decompress the views, and/or perform any suitable steps.

The decoding module can be the same as and/or different from the encoding module. In a specific example, the decoding module can perform the same operations as the encoding module in reverse order. In a second specific example, the decoding module can perform the inverse operation of the encoding module (e.g., addition instead of subtraction, decrypting instead of encrypting, etc.) in the same or reverse order of operations that the encoding module performed the operations. However, the decoding module can work in any suitable manner.

4. Method.

The method can include receiving views S100 and generating an encoded light field image S300. The method can optionally include processing the views S200, displaying the light field image S400, and/or any suitable steps. The method functions to generate a light field image (and/or light field video). The method can be performed once or multiple times (e.g., in parallel such as to generate more than one light field image at the same time, in series such as to generate each light field image sequentially, etc.). The method is preferably performed with the system disclosed above, but can additionally or alternatively be performed with any other suitable system.

Receiving the views S100 functions to access (e.g., acquire, retrieve, etc.) visual data (e.g., views 410) of a scene. S100 can be performed by a camera array, a computing system (e.g., a memory module, a server, a database, a rendering module, etc.), and/or any suitable component. S100 can include taking images (e.g., with each camera of the camera array), translating one or more cameras to acquire images from different perspectives, retrieving images from memory, retrieving images from a database, generating a model of a scene, acquiring (e.g., projecting) views of a generated model, and/or any suitable steps. Each image preferably corresponds to a view of the light field image. However, any suitable images can be used for the views in the light field image. In variants, such as when the images are acquired by a camera and/or camera array, S100 can include transmitting the views to a computing system (e.g., to a cloud computing system, to a computing system collocated with a display, to a display computing system, to a camera computing system, etc.).

Processing the views S200 functions to process one or more views. S200 is preferably performed after S100, but can be performed at the same time as and/or before (e.g., when processed views are stored in the storage module) S100. S200 is preferably performed by a computing system (e.g., a processing module), but can be performed by any suitable component. The light field images can be processed at a computing system collocated with the camera array (e.g., using a camera or camera array computer, etc.), at a remote computing system (e.g., at a cloud computing system), at a computing system collocated with a display (e.g., a display computing system), and/or otherwise be distributed. Processing one or more views can include: cropping views (e.g., according to a crop box), aligning views (e.g., positioning a feature of the views to the same position within the crop box), rectifying views (e.g., ensuring that epipolar lines for all cameras are parallel), transforming views (e.g., applying an affine transformation, a projective transformation, a Euclidean transformation, etc.), correcting views (e.g., balancing brightness, modifying brightness, balancing contrast, modifying contrast, modifying color, etc.), refocusing views, and/or any suitable processing step. Each view can be processed in the same and/or different manner. However, one or more views can be unprocessed.

S200 can optionally include generating the light field image, which functions to convert the views (e.g., acquired in S100, processed in S200, etc.) into a light field image. Generating the light field image preferably includes generating a quilt image from the views, but can include generating a photoset from the views (e.g., a set of the views), and/or any suitable steps. The quilt image can be a dense array (e.g., with a view for every element), a semi-dense array (e.g., with a view for most elements, and N/A or a placeholder for the remaining elements), and/or a sparse array (e.g., with a view for less than a majority of elements).

Generating the encoded light field image(s) S300 can function to combine the views (e.g., processed views, unprocessed views, etc.) into an encoded light field image 420 (and/or encoded light field video 425). The views 410 are preferably those from a light field image 400, but can additionally or alternatively be views from a light field video (e.g., including multiple frames, wherein each frame can be a light field image), a lenticularized image (e.g., as described in S400), and/or other views or images. The light field images can be encoded at the camera array (e.g., using a computing system collocated with the camera array, using a camera or camera array computer, etc.), at a remote computing system (e.g., at a cloud computing system), at a computing system collocated with a display (e.g., a display computing system), and/or otherwise be distributed.

S300 is preferably performed after S200, but can be performed at the same time as and/or before S200. S300 is preferably performed after S100, but can be performed at the same time as and/or before (e.g., when a light field image is stored in the storage module) S100. In variants, generating the light field image can be performed for: each light field image (e.g., a still light field image, each frame of the light field video), for a light field video (e.g., the set of light field images that collectively define the light field video, a subset of the light field images of the light field video, etc.), and/or for any suitable images. The encoded light field image(s) can be generated from a light field video, a light field image, from the views, from a lenticularized image (e.g., as described in S400), and/or from any suitable images. S300 is preferably performed by a computing system (e.g., an encoding module), but can be performed by any suitable component.

The encoded light field image can include all N views, and/or a subset of views (e.g., fewer than N views). The number of views in the subset can depend on the encoding and/or decoding method, a target quality (e.g., amount of loss, image appearance, etc.) for the displayed light field image, depend on a transmission bandwidth, depend on a target image size, and/or otherwise be determined. For example, the encoded light field image can include 1, 2, 3, 4, 6, 8, 10, 15, 20, 25, 30, 40, values therebetween, >40, a fraction of the number of views (such as N/2, N/4, N/8, N/10, N/20, etc.), and/or any suitable number of views.

S300 can include determining key view(s), which functions to identify views within the subset of the views that include metadata and/or to determine a subset of views to append metadata to. The key view(s) can function as references within the encoded light field image(s) and can help prevent errors from accumulating (e.g., during decoding the encoded light field image). The key view(s) can be selected automatically and/or manually. One or more key views can be selected for a light field image. The number of selected key views (and/or which key views are selected) can be determined based on the encoding scheme, based on a view quality, and/or otherwise determined. The key view is preferably the first view (e.g., of the arranged views, view “1,” view “N,” etc.), but can be a view associated with a central perspective (e.g., approximately view N/2), a view associated with an occluded surface (e.g., a view depicting a cavity that is occluded in other views), diametrically opposing views, and/or any suitable view. However, the key view can be an image of the scene, an icon, a thumbnail, a drawing, a render, and/or any suitable visual. In variants, a key view can be included with any suitable frequency between every view and every 100*N views (such as every N/10, N/5, N/4, N/2, N, 2*N, 4*N, 10*N, 20*N, 30*N, 50*N, etc. views), such as between a predetermined number of lightfield images, and/or with any suitable frequency. However, key views can be included once (e.g., the first view), randomly, based on a ruleset (e.g., a key view corresponding to each light field image, a key view based on a difference in visual data in views, a key view based on changes in view order, etc.), and/or at any suitable frequency. In a specific example, the key view can be a thumbnail generated by shrinking a view from the light field image to a smaller representation of the view (e.g., if the original view has a resolution of 1080p, the thumbnail may have a resolution of 30p, 60p, 96p, 120p, 144p, 240p, etc.), as sketched below. However, any suitable key frame can be used.
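For illustration, a minimal sketch of generating a thumbnail key view by shrinking one view, as in the specific example above; the file names, the 96p target, and the use of the Pillow library are assumptions:

```python
# Hypothetical sketch: shrink one view of the light field image into a
# 96p thumbnail key view; metadata can then be attached to this view.
from PIL import Image

view = Image.open("view_000.png")                 # e.g., a 1920x1080 view
scale = 96 / view.height                          # target height: 96 pixels
thumbnail = view.resize((round(view.width * scale), 96))
thumbnail.save("key_view_thumbnail.png")
```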

The metadata can function to provide instructions regarding how the light field image and/or light field video is encoded (e.g., to enable or facilitate decoding the encoded data), provide information regarding the camera array (e.g., camera calibration), and/or can perform any suitable function. In specific examples, the metadata can include: view number, key frame (e.g., indicating a key frame, such as a specific light field image, within a light field video), view order (e.g., how the views are arranged, such as ascending order, descending order, etc.), a compressed video (e.g., a video that includes each view associated with a light field image), scan (e.g., type of scan), metadata as described above, and/or any suitable information.

S300 can include compressing data (e.g., views, light field images, videos, lightfield videos, depth maps, depth arrays, difference light field images, metadata, etc.), which functions to decrease the size of the data and/or generate the encoded light field image. The data is preferably compressed in a video format (e.g., using a video codec), but can be compressed in an image format and/or in any suitable format. The data is preferably compressed using a codec, but can be compressed or encoded using machine learning (e.g., a neural network such as a convolutional neural network, deep neural network, nonlocal neural network, recursive neural network, etc.; a genetic algorithm; Bayesian optimization; geometry networks; context networks; Adaptive Separable Convolution; Deep Voxel Flow; etc.) and/or using any suitable algorithm and/or computer code. The codec can include: lossy algorithms (e.g., transform coding such as discrete cosine transform, wavelet transform, etc.; reducing the color space; chroma subsampling; fractal compression; MPEG-4; etc.), visually lossless algorithms, and/or lossless algorithms (e.g., run-length encoding, area image compression, predictive coding, entropy coding, dictionary encoding such as LZ, LZW, etc.; DEFLATE; chain codes; H.264; H.265; motion jpeg 2000; etc.). However, the data can be compressed in any suitable manner. During encoding (and decoding), the data can be scanned in any manner. For example, raster scanning, continued raster scanning, diagonal scanning, diagonal scanning with parallel returns, right orthogonal scanning, spiral-in scanning, spiral-out scanning, continued orthogonal scanning, vertical symmetric by rows scanning, vertical symmetric by columns scanning, main diagonal symmetric scanning, diagonal symmetric by secondary lines scanning, z-scanning, block scanning, x-scanning, and/or any suitable scanning can be used.

In a first embodiment, S300 can include arranging the views, which functions to create a linear array of views (e.g., an M×1 array, where M is a natural number; a timeseries of sequential views; etc.). However, the views can be arranged in a two dimensional array and/or in any manner. The views can be arranged in ascending order (e.g., according to the respective view index, pose, etc.), descending order, randomly, according to a predetermined pattern, and/or in any suitable manner. The arranged views can start at the first view, the last view, a central view (e.g., view number approximately N/2), a random view, and/or any suitable view. The views are preferably arranged in the same order for each light field image and/or light field video (or the views associated therewith), but the views can be arranged in a different order for each light field image and/or light field video.

In a first example, for a (still) light field image with N views, the linear array can be an N×1 array of the views arranged in ascending order starting at 1.

In a second example, for a light field video with k sequential light field images, wherein each light field image includes N views, the linear array can be an (N*k)×1 array (i.e., N multiplied by k). In the second example, the views from consecutive light field images of the light field video can have alternating arrangements (e.g., 1 . . . N, N . . . 1, etc.; ordered in a sequentially inverting raster order, also referred to as a “zigzag” or “ping-pong” order), or be arranged in the same order (e.g., 1 . . . N, 1 . . . N, etc.). In a specific example, as shown in FIG. 9, views associated with a light field image of the light field video can be arranged in ascending order starting with view 1. In this specific example, views associated with the subsequent light field image of the light field video can be arranged in descending order starting with view N. This alternating arrangement of views between light field images can be referred to as a “zigzag arrangement” or “ping pong arrangement,” where light field videos that are compressed with these arrangements can be referred to as “zigzag compressed” or “ping pong compressed”. However, the views associated with a light field video can be arranged in any suitable order.
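For illustration, a minimal Python sketch of the zigzag (“ping pong”) arrangement described above; the function name and data layout are assumptions:

```python
def zigzag_views(frames):
    """Flatten a light field video (a list of frames, each a list of N
    views) into one linear sequence, alternating view order between
    consecutive frames: 1..N, N..1, 1..N, ..."""
    sequence = []
    for t, frame in enumerate(frames):
        sequence.extend(frame if t % 2 == 0 else reversed(frame))
    return sequence

# Example with N = 3 views per frame and two frames:
frames = [["v1", "v2", "v3"], ["v1'", "v2'", "v3'"]]
print(zigzag_views(frames))  # ['v1', 'v2', 'v3', "v3'", "v2'", "v1'"]
```

A design note: with this ordering, the last view of one frame is immediately followed by the same-indexed view of the next frame, so consecutive entries in the flattened sequence tend to differ little, which suits video codecs that exploit inter-frame similarity.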

In a second embodiment, S300 can include determining a depth map, which functions to determine a depth to each pixel (and/or feature) within a given view. The depth map preferably encodes the depth to each pixel in an image, but can encode the depth to a subset of pixels of the image (e.g., the subset of pixels corresponding to one or more features) and/or any suitable information. The depth map is preferably determined for a subset of the views of the light field image, but can be for all views, and/or any suitable views. The views within the subset of views are preferably evenly spaced (e.g., alternating views; every 3rd, 4th, 6th, 8th, 12th, 20th, etc. view), but can be irregularly spaced, selected according to a ruleset (e.g., one view from each row of a quilt image, one view from each column of a quilt image, selected based on the inter-view change, etc.), or selected in any suitable manner. The depth map is preferably determined between adjacent views (e.g., a depth map associated with the first view can be determined between the first and second view; a depth map associated with the second view can be determined between the first and second or second and third view; etc.), wherein the adjacent views can be spatially adjacent or temporally adjacent, but can be determined between any pair of views. The depth map is preferably determined based on the set of disparity vectors between the views, but can additionally or alternatively be determined using a depth sensor (e.g., be measured or acquired when the view is acquired) and/or otherwise be determined. The disparity vector can be a vector that indicates a pixel separation between a pixel in one view and a corresponding pixel in another view. The depth map can be calculated using the disparity vector and the camera calibration (e.g., the camera intrinsic parameters such as focal length, principal point, etc.; camera extrinsic parameters such as separation between cameras, camera pose, etc.). However, the depth map can be determined with a depth sensor (e.g., paired with the camera, with the camera array, etc.), or in any suitable manner.

In an illustrative example, the depth map for view 1 is determined by determining the disparity vector between view 1 and view 2 (e.g., by identifying corresponding pixels and/or features between view 1 and view 2), and calculating the depth map using the camera calibration and the disparity vector.
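For illustration, a minimal sketch of converting a disparity map to a depth map using the camera calibration; the rectified-stereo relation depth = focal_length × baseline / disparity and all variable names are assumptions for the example (the method itself does not prescribe a specific formula):

```python
import numpy as np

def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Convert per-pixel disparity (pixels) to depth (meters) with
    depth = focal_length * baseline / disparity."""
    disparity = np.asarray(disparity_px, dtype=np.float64)
    depth = np.full(disparity.shape, np.inf)   # zero disparity -> at infinity
    valid = disparity > 0
    depth[valid] = focal_length_px * baseline_m / disparity[valid]
    return depth

# Example: adjacent cameras 10 mm apart, focal length 1400 px.
disparity = np.array([[14.0, 7.0], [0.0, 28.0]])
print(depth_from_disparity(disparity, 1400.0, 0.01))
# [[ 1.   2. ]
#  [ inf  0.5]]
```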

In a second illustrative example, the depth map for view 10 can be determined by interpolating the depth map determined between view 9 and view 10 and the depth map determined between view 10 and view 11.

In a third illustrative example, the depth map for a view can be determined using a machine learning algorithm (e.g., a neural network). However, the depth map can be interpolated in any suitable manner.

Determining the depth map can include determining a depth array 421 (also referred to as a “depth quilt image”). The depth array 421 can include one or more of: the set of views, the depth map associated with each of the set (and/or subset) of views, the subset of views (and the associated depth maps), the color data (e.g., RGB values for each pixel or view) for the subset of views, the relationship between the subset of views (e.g., correspondences between pixels of views within the subset of views), and/or any suitable information. The depth array can be arranged in a linear array (e.g., vertical array, horizontal array, etc.), a 2D array (e.g., such as an M×2 array, wherein M corresponds to the number of views in the subset of views and the depth array includes the subset of views and the depth map corresponding to the subset of views), and/or in any manner. In an illustrative example, a depth array can be a quilt image where each view in the quilt image includes depth information as well (e.g., in the alpha channel of the image data). However, a depth array can otherwise be defined.
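For illustration, a minimal sketch of the alpha-channel variant of a depth quilt entry: a view's depth map is quantized and stored as the alpha channel of its RGBA data. The [near, far] quantization range and 8-bit depth are assumptions:

```python
import numpy as np

def pack_view_with_depth(rgb, depth, near=0.1, far=10.0):
    """Pack an HxWx3 uint8 view and an HxW float depth map into an
    HxWx4 RGBA array, with depth quantized to 8 bits over [near, far]."""
    normalized = np.clip((depth - near) / (far - near), 0.0, 1.0)
    alpha = (normalized * 255).astype(np.uint8)
    return np.dstack([rgb, alpha])

# Example: a 4x4 view at a uniform 5 m depth becomes a 4x4x4 RGBA tile.
rgb = np.zeros((4, 4, 3), dtype=np.uint8)
depth = np.full((4, 4), 5.0)
rgba = pack_view_with_depth(rgb, depth)   # rgba.shape == (4, 4, 4)
```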

In a variant of the second embodiment, S300 can include generating a 3D reconstruction, which functions to generate a geometric reconstruction of the scene. The 3D reconstruction 423 can be represented using voxels, rays, polygons, contours, points, depths, meshes, convex hulls, and/or be represented in any way. The 3D reconstruction can optionally include a texture (e.g., color texture). The 3D reconstruction is preferably generated from the depth maps (e.g., the depth maps corresponding to the subset of views, depth maps corresponding to the set of views, depth maps corresponding to another set of views, etc.) and the camera array calibration (e.g., the intrinsic and/or extrinsic parameters associated with the camera array), but can alternatively be determined from a geometric scan of the scene or otherwise determined. The 3D representation can be complete (e.g., include data for all sides of the representation), or incomplete (e.g., include data for a portion of the representation, include only data from a given point of view, etc.). In a specific example, generating the 3D reconstruction can include: determining the depth map (e.g., for a subset of views), generating a geometric 3D reconstruction from the depth maps (e.g., based on the physical relationships between the views within the view subset), and projecting the views from the view subset onto the geometric 3D reconstruction to generate a visual 3D reconstruction of the scene. In this specific example, the geometric 3D reconstruction can be a convex hull, a mesh, and/or any suitable structure. However, the 3D reconstruction can be generated in any suitable manner.

A 3D reconstruction is preferably determined for a light field image, but can additionally or alternatively be determined for a light field video and/or any other data structure. In a first example, a light field video can be represented by a single 3D reconstruction with different poses (e.g., representative of the scene pose relative to a point of view). In a second example, a light field video can be represented by a series of 3D reconstructions, each representative of a respective light field image (frame) of the light field video. However, light field videos can be otherwise represented.

In a third embodiment, S300 can include determining difference light field images, which can function to determine differences between views in one light field image and views in another light field image (and/or between views within a given light field image). Each difference light field image 427 can include a set of difference views, wherein a difference view can be the difference between two views (e.g., views with the same index in two different frames; two different views in the same light field image; etc.). In variants of this embodiment, the majority of light field images (e.g., greater than about 50%, 60%, 75%, 90%, 95%, 97.5%, 99%, 100%, etc. of light field images) of a light field video can be encoded as difference light field images. For example, twenty-nine out of every thirty frames of a lightfield video can be represented as difference light field images. However, a minority of the light field images (e.g., less than about 1%, 5%, 10%, 25%, 30%, 40%, 50%) of a light field video can be encoded as difference light field images. The difference light field images can be computed from arranged views, light field images, as-acquired views, as-processed views, and/or any suitable views and/or images can be used. A difference light field image is preferably computed for consecutive light field images (e.g., consecutive frames of a light field video, the frame prior to the current frame, the frame after the current frame, etc.), but can be computed between a light field image and a key light field image (e.g., a reference light field image), a light field image and a nonconsecutive image, and/or between any suitable light field images. In an illustrative example, the difference light field image associated with light field image t (where t is an index) is the difference between light field image t and light field image t−1. The difference is preferably computed between analogous views of the light field images (e.g., same view index, views associated with the same camera, views associated with the same camera position, etc.), but can be computed for nonanalogous views. In some variants of the third embodiment, difference light field images can include difference views computed between views of the same light field image, such as views associated with adjacent cameras, with adjacent camera positions, and/or between any suitable views.

In a first variant, determining the difference light field images can include determining difference views (e.g., between as-acquired views, between arranged views, etc.) and generating a difference light field image from the differenced views. The difference views are preferably determined between views from different light field images that were captured by the same camera, but can alternatively be determined between views of the same light field image. In a second variant, determining the difference light field image can include determining difference views and arranging the views (e.g., as described above). In a third variant, determining difference light field images can include calculating a difference between two light field images and arranging the views from the difference light field image. However, determining difference light field images can include any suitable steps.
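For illustration, a minimal sketch of difference encoding and decoding for a light field video, mirroring FIGS. 10 and 11: the first frame is kept whole (a key light field frame) and each later frame is stored as a per-pixel difference (here, over whole quilt images) from the preceding frame. The use of signed 16-bit intermediates is an assumption:

```python
import numpy as np

def encode_differences(frames):
    """frames: list of quilt images (HxWx3 uint8 arrays). Returns the
    first frame as a key light field frame plus signed per-pixel
    differences for each subsequent frame."""
    key = frames[0]
    diffs = [cur.astype(np.int16) - prev.astype(np.int16)
             for prev, cur in zip(frames, frames[1:])]
    return key, diffs

def decode_differences(key, diffs):
    """Reconstruct the frames by cumulatively adding each difference
    to the previously decoded frame (as in FIG. 11)."""
    frames = [key]
    for diff in diffs:
        frames.append((frames[-1].astype(np.int16) + diff).astype(np.uint8))
    return frames

# Round trip on two tiny synthetic quilt frames:
a = np.zeros((4, 4, 3), dtype=np.uint8)
b = np.full((4, 4, 3), 7, dtype=np.uint8)
key, diffs = encode_differences([a, b])
assert np.array_equal(decode_differences(key, diffs)[1], b)
```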

In a fourth embodiment (as shown for example in FIG. 17), S300 can include determining a representation associated with the light field image. The representation is preferably a polynomial representation, but can include any suitable decomposition (e.g., a wavelet transformation, a frequency decomposition, a decomposition into any suitable orthogonal or nonorthogonal functions, etc.). The coefficients of the polynomial preferably encode (e.g., store, represent, etc.) the image (e.g., color information, color for a given color channel, depth, etc.); however, additionally or alternatively the polynomial order and/or any suitable characteristic can encode the image information. For example, the polynomial can include a different variable for each channel for each pixel of the constituent views, wherein the coefficient can represent the value of the given channel. In another example, each variable can represent a different view. However, the polynomial can be otherwise constructed. The polynomial representation can be determined, for example, using Lagrange decomposition, using spline interpolation, using polynomial fitting, and/or otherwise be determined. The polynomial representation can be determined image-wise (e.g., applied to a light field image as a whole), view-wise (e.g., a view of a light field image is associated with a polynomial), cluster-wise (e.g., a cluster of pixels within a view is associated with a polynomial), pixel-wise (e.g., individual pixels are associated with a polynomial), and/or otherwise be applied.
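For illustration, a minimal pixel-wise sketch of the polynomial representation: the value of one color channel at one pixel, traced across the views, is fit with a low-order polynomial and stored as its coefficients. The quadratic degree and the synthetic data are assumptions:

```python
import numpy as np
from numpy.polynomial import polynomial as P

# One color channel at one pixel, sampled across N = 45 views:
view_indices = np.arange(45, dtype=np.float64)
red_values = 100 + 2.0 * view_indices - 0.02 * view_indices**2

coeffs = P.polyfit(view_indices, red_values, deg=2)  # 45 samples -> 3 coefficients
reconstructed = P.polyval(view_indices, coeffs)      # decode by evaluation
assert np.allclose(reconstructed, red_values)
```

The compression here comes from replacing the per-view samples with a handful of coefficients; real image data would generally require a higher degree or a lossy fit.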

In variants of S300, two or more of the first, second, third, and/or fourth embodiments can be combined and/or applied to the same lightfield image.

The method can optionally include transmitting the light field image(s) (e.g., encoded lightfield image(s), lenticularized image, decoded light field image, etc.), which functions to transmit the light field image(s) to an endpoint. The endpoint can include one or more: display, memory module (e.g., of a computing system), processing module, decoding module, computing system, database, server, user, viewer, and/or any suitable component. Transmitting the light field image(s) can be performed by a communication module (e.g., of the computing system) and/or by any suitable component. Transmitting the light field image(s) can include storing the light field image(s), which can function to provide the light field image(s) for retrieval at a later time. The light field images can be stored at a storage module of a computing system and/or by any suitable component. However, the light field image, views, and/or any suitable data can be transmitted.

The lightfield images can be stored as encoded lightfield images, as lenticularized lightfield images, as multilenticularized images, as raw lightfield images, and/or in any format.

Displaying the light field image S400 functions to display the light field image to one or more viewers. S400 preferably displays the light field image decoded from the encoded light field image, but can display the encoded light field image, a light field image (e.g., generated in S300), views or a light field image as received in S100, a lenticularized image, and/or any suitable views and/or image. The displayed light field image is preferably perceived as three-dimensional (e.g., a holographic image), but can be perceived as two-dimensional, one-dimensional, and/or otherwise be perceived. The light field image is preferably viewable without using peripherals (e.g., headsets, glasses, etc.). For example, the displayed light field can be perceived as three dimensional without the use of peripherals. However, the light field image can be viewable using peripherals. S400 preferably occurs after S300, but can occur before and/or at the same time as S300. S400 is preferably performed by a display, but can be performed by a computing system and/or any suitable system. The light field image is preferably displayed as a 3D render of the scene, but can be a 2D render of the scene and/or any suitable render of the scene.

S400 can include decoding the encoded light field image(s) (and/or video), which functions to convert the encoded light field image into a light field image that can be read by one or more displays. The encoded light field image(s) are preferably decoded based on information in the metadata of one or more key view(s) of the encoded light field image(s), but can be decoded independent of the metadata and/or based on any suitable information. The encoded light field image can be decoded in parts (e.g., decode one light field image at a time, decode a set of light field images at a time, decode a buffer of light field images, decode a buffer of views, etc.) and/or in full (e.g., decode all views at once, decode all light field images at once, etc.). The views and/or light field images can be decoded sequentially (e.g., one view at a time, one image at a time, etc.) and/or in parallel (e.g., decode multiple views simultaneously, decode multiple light field images simultaneously, etc.). The light field images can be decoded at the camera array (e.g., using a computing system collocated with the camera array, using a camera or camera array computer, etc.), at a remote computing system (e.g., at a cloud computing system), at a computing system collocated with a display (e.g., a display computing system), and/or otherwise be distributed.

In a first embodiment, decoding the encoded light field image(s) can include decompressing the encoded light field image(s), which can function to extract views from the encoded light field image(s). This is preferably used when the light field image is encoded as a video, but can be used for other encoding schemes. The views are preferably extracted in order (e.g., in view order, based on the view indexing, based on the key frame metadata, etc.), but can be extracted in any suitable order. In a specific example, the views (e.g., the set of views corresponding to a light field image) can be extracted from a video (e.g., a stored video, an encoded light field image, a video in the metadata of a thumbnail, etc.) and arranged in a light field image (e.g., a quilt image). The arrangement order can be: specified by the metadata, specified by the encoding scheme (e.g., the arrangement pattern), or otherwise determined.
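For illustration, a minimal sketch of the specific example above: extracting views from an encoded video (one frame per view) so they can be arranged into a quilt image. The use of OpenCV's VideoCapture and the file name are assumptions:

```python
import cv2  # OpenCV

def views_from_video(path, num_views):
    """Read the first num_views frames of an encoded video, one view
    per frame, in the stored view order."""
    capture = cv2.VideoCapture(path)
    views = []
    for _ in range(num_views):
        ok, frame = capture.read()
        if not ok:
            break
        views.append(frame)
    capture.release()
    return views

views = views_from_video("encoded_light_field.mp4", num_views=45)
# The views can then be tiled into a quilt image, e.g. with the
# make_quilt sketch given earlier: make_quilt(views, rows=9, cols=5)
```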

In a second embodiment, decoding the encoded light field image(s) can include interpolating between views of the encoded light field image, which can function to recover and/or prepare views that are not stored in the encoded light field image. This is preferably used when the light field image is encoded as a depth array, but can be used for other encoding schemes. New views are preferably generated by interpolating between the stored views, but can be generated by reprojecting a 3D model (generated based on the stored views) into a set of virtual cameras, generating the views using a trained neural network, and/or generated in any suitable manner. Interpolating between the stored views can include interpolating between the pixel locations for the adjacent views, interpolating based on the depth map for the adjacent views, and/or any suitable steps. The adjacent views can be temporally adjacent, spatially adjacent, adjacent in the array, and/or otherwise related. However, new views can be simulated (e.g., using an optical model of the scene) and/or determined in any suitable manner.

In a first variant of the second embodiment, generating intermediate views can be performed using a neural network. For example, Depth-Aware Video Frame Interpolation (DAIN), Channel Attention Is All You Need for Video Frame Interpolation (CAIN), frame interpolation with multi-scale deep loss functions and generative adversarial network (FIGAN), and/or any suitable machine learning algorithm can be used to generate views intermediate between two (or more) views. In this variant, the intermediate views can be generated from the views, from depth information associated with the views, and/or using any suitable information.

In a second variant of the second embodiment, generating intermediate views can include projecting (or reprojecting) two (or more) views. The intermediate views can be projected from views associated with a camera position to the left of the intermediate view, views associated with a camera position to the right of the intermediate view, views associated with a camera position above the intermediate view, views associated with a camera position below the intermediate view, and/or any suitable views. Preferably, intermediate views are projected from at least two views (e.g., views on opposing sides of the intermediate view, such as a view to its left and a view to its right), but can be projected from a single view, four views, eight views, and/or any suitable number of views.

In an illustrative example of the second variant, an intermediate view can be generated by determining a depth pixel-by-pixel from a first view; initializing the intermediate view with the depth for each pixel from the first view (e.g., initially, the intermediate view has the same depth information as the first view); determining a pixel offset between the first view and the intermediate view based on: the depth, a difference in perspective (e.g., a difference in angle, a target difference, etc.) between the first view and the intermediate view, and/or camera properties (e.g., camera perspective, distance from a convergence plane, etc.); and writing the color value for a given pixel from the first view to the intermediate view at a pixel determined using the pixel offset. When the depth of the intermediate view pixel (e.g., from the initial intermediate view) is less than the depth of the given pixel, the color value is not written to the intermediate view (e.g., because the pixel would be occluded in the intermediate view). When the depth of the intermediate view pixel is greater than the depth of the given pixel, the color and depth textures for the intermediate view are rewritten. In variants, the process in this illustrative example can be repeated using a second view to generate the intermediate view. The process can be selectively repeated (e.g., to fill gaps in the intermediate view) and/or repeated for the whole intermediate view. The resulting intermediate views (e.g., reprojected from the first view and the second view) can be combined by: averaging the views, merging the views (e.g., keeping pixel data for pixels that agree, keeping pixel data that is not out of bounds or not a number, etc.), generating a composite view, and/or otherwise combining the views or using a single generated view.

In variants of this specific example, when the depth of the intermediate view (e.g., the initialized intermediate view) is within a threshold (e.g., within a depth of <0.0001, 0.0001, 0.0005, 0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1, 5, 10, >10, values therebetween, etc.) of the depth from the reprojecting view, the color for the intermediate pixel can be determined by interpolating the color from the projected views. As shown for example in FIG. 6, a pixel in intermediate view i can be determined by interpolating between pixels for views j and j+a. However, the pixels can otherwise be interpolated.

In some variants of this specific example, a flickering phenomenon can be observed, for instance, when two or more instances of the process try to provide color data to the same pixel concurrently. The flickering phenomenon can be mitigated, for instance, by repeating the reprojection process two or more times (e.g., using a previous iteration's intermediate view as the initial intermediate view for the current iteration), where each repeat can further decrease the flickering phenomenon. However, the flickering phenomenon (and/or other artifacts) can otherwise be mitigated.
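
The following sketch condenses the illustrative example above into code, under simplifying assumptions: fronto-parallel views, a purely horizontal pixel offset proportional to disparity (inverse depth) and to the fractional baseline between the cameras, and numpy arrays for the color and depth textures. The offset model is an illustrative stand-in for the perspective-difference and camera-property terms described above.

```python
import numpy as np

def reproject_view(src_color, src_depth, baseline_frac):
    """Reproject one view toward an intermediate camera position.
    src_color: (H, W, 3) array; src_depth: (H, W) float array.
    The pixel offset is taken as proportional to disparity (1/depth)
    and to the fractional baseline between the views; a depth test
    keeps the nearest surface so occluded pixels are not overwritten."""
    h, w = src_depth.shape
    out_color = np.zeros_like(src_color)
    out_depth = np.full((h, w), np.inf)
    for y in range(h):
        for x in range(w):
            d = src_depth[y, x]
            # horizontal shift derived from depth and the perspective
            # difference between the source and intermediate cameras
            x2 = x + int(round(baseline_frac / max(d, 1e-6)))
            if 0 <= x2 < w and d < out_depth[y, x2]:
                out_depth[y, x2] = d
                out_color[y, x2] = src_color[y, x]
    return out_color, out_depth
```

Running this once from a left view and once from a right view, then merging the two results (and interpolating colors where the depths agree within the threshold discussed above), corresponds to the combination and gap-filling steps of the example.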

In a third embodiment, decoding the encoded light field image(s) can include generating a set of views from a 3D representation, which can function to generate a set of virtual views 414 (e.g., virtual images) associated with a light field image of a scene. This is preferably used when the light field image is encoded as a 3D representation, but can be used for other encoding schemes. The set of virtual views is preferably generated using a set of virtual cameras 150 (e.g., a set of virtual cameras that are simulated with the same calibration parameters as the camera array, a set of virtual cameras with different calibration parameters from the camera array, a set of virtual cameras positioned based on the viewer and/or user position, camera model, etc.), but can be generated in any manner. The set of virtual cameras can be positioned at a location within or relative to the 3D representation based on the display, based on a position of one or more viewers relative to the display, at predetermined positions, and/or otherwise be positioned. The set of virtual views preferably includes the same number of views as the light field image, but can include more virtual views and/or fewer virtual views than the number of views in the light field image. The virtual views are preferably arranged (e.g., in the same order as the views, in a different order from the views) to generate the light field image to be displayed.

In a first variant of the third embodiment, the color for the virtualviews can be sampled from a texture of the 3D representation. In anillustrative example, the views can be generated using photogrammetrytechniques. However, the views can otherwise be generated.

In a second variant of the third embodiment, the color for the virtualviews can be determined based on a color extracted from the views usedto generate the 3D representation. Typically, in the second variant, atexture (e.g., color texture) is not applied to the 3D representation.However, a texture can be applied to the 3D representation. The colorinformation for the virtual views can be extrapolated, interpolated,inferred, and/or otherwise generated from the views used to generate the3D representation. However, an image (e.g., from the set of views usedto generate the 3D representation) can be associated with the 3Drepresentation and color information can be sampled from the image(e.g., interpolating between or extrapolating from pixels as needed)and/or the color information can otherwise be generated.
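
As a rough illustration of the third embodiment, the sketch below renders a set of virtual views by sweeping pinhole cameras along a horizontal baseline over a colored point cloud; the point cloud is one simple stand-in for the 3D representation, and the camera parameters (focal length, baseline, view count, resolution) are illustrative assumptions.

```python
import numpy as np

def render_virtual_views(points, colors, n_views=45, focal=500.0,
                         width=420, height=560, baseline=0.3):
    """Render virtual views by projecting a colored point cloud (a
    simple stand-in for the 3D representation) through pinhole cameras
    spaced along a horizontal baseline, using a z-buffer per view."""
    views = []
    for i in range(n_views):
        cx = baseline * (i / (n_views - 1) - 0.5)  # camera x-offset
        img = np.zeros((height, width, 3), dtype=np.uint8)
        zbuf = np.full((height, width), np.inf)
        for (x, y, z), c in zip(points, colors):
            if z <= 0:
                continue  # behind the camera
            u = int(focal * (x - cx) / z + width / 2)
            v = int(focal * -y / z + height / 2)
            if 0 <= u < width and 0 <= v < height and z < zbuf[v, u]:
                zbuf[v, u] = z
                img[v, u] = c
        views.append(img)
    return views
```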

In a fourth embodiment, decoding the encoded light field image(s) caninclude combining the views from the encoded light field image (e.g.,calculating views from a key view 416 and a set of difference views418), which functions to recover the original light field image(s) fromthe encoded light field image(s). This is preferably used when the lightfield image is encoded using difference views, but can be used for otherencoding schemes. Combining the views preferably includes adding a lightfield image to a difference light field image, but can include anysuitable steps. Combining the views is preferably performed sequentially(e.g., starting at a key light field image, each subsequent differenceview added to a preceding view) to recover the set of light fieldimages, but can be performed in parallel (e.g., starting at each keylight field image, each difference view added to a key view) and/or inany suitable order.
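
A minimal sketch of this sequential combination, assuming each light field image is held as a uint8 quilt array and each difference image as the signed per-pixel difference from its predecessor:

```python
import numpy as np

def reconstruct_from_differences(key_quilt, difference_quilts):
    """Sequentially recover light field frames from a key light field
    image plus difference light field images (frame[i] = frame[i-1] +
    diff[i]); signed arithmetic avoids uint8 wraparound."""
    frames = [key_quilt.astype(np.int16)]
    for diff in difference_quilts:
        frames.append(frames[-1] + diff.astype(np.int16))
    return [np.clip(f, 0, 255).astype(np.uint8) for f in frames]
```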

In a fifth embodiment, decoding the encoded light field image(s) caninclude computing or extracting the light field images from adecomposition representation (e.g., a polynomial representation) of theimages.

S400 preferably includes generating a lenticularized image from a lightfield image. A lenticularized lightfield image 430 (e.g., holographic image) preferably refers to lightfield images that have been aligned to a display, but can be otherwise defined. Aligning the lightfield images to a display preferably includes applying a calibration associated with the display to the lightfield image (e.g., to align pixels of the lightfield image to display pixels), but can be otherwise performed. Generally, though not exclusively, the calibration is unique to a display; therefore, lenticularized images are typically associated with a small number of displays (those displays with substantially the same calibration). However, lenticularized lightfield images can be associated with any number of displays.

The lenticularized image can be generated at a computing systemcollocated with the display (e.g., a display computing system, as shownfor example in FIG. 15A), at a remote computing system (e.g., cloudcomputing, as shown for example in FIG. 15B, where the cloud computingsystem can store the calibration(s) for one or more display and/orreceive the calibration associated with each of one or more display),and/or at any suitable computing system.
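
For illustration, the sketch below applies one common form of slanted-lenticular interlacing, in which each display subpixel samples the view selected by calibration parameters. The `pitch`, `slant`, and `center` fields are hypothetical stand-ins for a real display's calibration; actual calibration formats vary by display.

```python
import numpy as np

def lenticularize(views, calib, width, height):
    """Interlace a set of views into a lenticularized image: each RGB
    subpixel of the output samples the view chosen by a slanted-
    lenticular mapping. `views` is a list of (H, W, 3) uint8 arrays;
    `calib` is a dict with illustrative keys pitch, slant, center."""
    n = len(views)
    out = np.zeros((height, width, 3), dtype=np.uint8)
    for y in range(height):
        for x in range(width):
            for ch in range(3):  # R, G, B subpixels offset horizontally
                frac = ((x + ch / 3.0) * calib["pitch"]
                        + y * calib["slant"] - calib["center"]) % 1.0
                view = int(frac * n) % n
                vh, vw = views[view].shape[:2]
                # sample the chosen view at the same normalized position
                out[y, x, ch] = views[view][
                    int(y / height * vh), int(x / width * vw), ch]
    return out
```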

A multi-lenticularized lightfield image preferably refers to a lenticularized lightfield image that has been duplicated along at least one axis, but can be otherwise defined. Multilenticularization can function to decrease a chromatic anomaly present in the displayed image and/or otherwise function. Multilenticularization is particularly, but not exclusively, beneficial when the computing system (e.g., a GPU thereof) is able to handle higher resolution images than need to be displayed, but is only able to work with images that use a lossy image format (such as a format that is not pixel perfect, that does not perfectly preserve color, etc.). The number of multiples for duplication can be associated with the degree of lenticularization. For example, as shown in FIG. 14A, a lenticularized image that has been duplicated once (e.g., such that each pixel is represented two times in the resulting lenticularized image) can be referred to as a double-lenticularized image 435 (e.g., double lenticular image). The multiple is preferably an integer, but can be a rational or irrational value. Analogously, the entire lenticularized lightfield image is preferably duplicated. However, any subset of the lenticularized lightfield image can be duplicated. In specific examples, particularly when the image compression retains approximately half of the pixel color information, double lenticularization (e.g., lenticularization with a multiple of 2) can be sufficient to mitigate the effects of artifacts arising from the image compression. However, any suitable multiple can be used.

The multilenticularized lightfield image is preferably generated using nearest neighbor filtering (also referred to as point filtering), but can be generated using any suitable algorithm. Multilenticularized lightfield images are preferably generated by duplicating the pixel column of the lightfield image a number of times equal to the multiple. For example, as shown in FIG. 14A, each pixel column of a double lenticularized image can be duplicated once, such that each column appears twice. However, multilenticularized lightfield images can additionally or alternatively be generated by duplicating each pixel row, by duplicating each column and each row, by duplicating the pixels along an offset, duplicating a subset of pixels of the image, duplicating the entire lenticularized image (e.g., wherein the duplicate can be appended to the left, right, top, bottom, and/or offset from one of the directions relative to the original lenticularized image), and/or be otherwise generated. Duplicate pixels are preferably adjacent or proximal the duplicated pixel (e.g., where the duplicated pixel can be along one edge or along the center of a contiguous pixel block of the duplicated pixel), but can be otherwise arranged.
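
In numpy, this column duplication with point filtering reduces to a one-liner (a sketch, assuming the lenticularized image is an H×W×3 array and the multiple is an integer):

```python
import numpy as np

def multi_lenticularize(lenticularized, multiple=2):
    """Duplicate each pixel column `multiple` times (nearest-neighbor /
    point filtering), so each original column is represented `multiple`
    times in the output; multiple=2 yields a double lenticularized
    image."""
    return np.repeat(lenticularized, multiple, axis=1)
```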

In some variants of lenticularized images, the image can be distributed across two or more rows (or columns), where each row (or column) includes a subset of the image. For instance, a first segment 436 of the lenticularized image can be appended to a second segment 437 of the lenticularized image, wherein the first segment and second segment can share a predetermined number of identical pixels. Each portion (or segment) preferably includes a predetermined number of overlapping pixels with other portions (or segments), which can function to improve and/or ensure that the portions (or segments) can be stitched together for display. The predetermined number can depend on an image resolution, a computing system (e.g., a processing module, GPU, etc. thereof), be a fixed value, depend on an image quality, and/or can otherwise be selected. The predetermined number is preferably at least 16 pixels (e.g., 16, 20, 25, 30, 40, 50, 60, 75, 100, 150, etc. distinct pixel columns), but can be less than 16 pixels. In these variants, the rows can be multi-lenticularized. In a specific example, as shown in FIG. 14B, a lenticularized image can be split between a top image portion and a bottom image portion, where each portion is approximately the same size (e.g., includes the same number of pixels). In this example, each pixel of the lenticularized image can be duplicated. Between the top image portion and bottom image portion, a group of stitching pixels (e.g., overlapping pixels) is duplicated on the right edge of the top image portion and the left edge of the bottom image portion. However, the lenticularized image can be divided into any number of portions and/or arranged in any manner.
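
One way this splitting could look in code, assuming an even image width, a 16-column stitch overlap, and that the two segments are stacked as the top and bottom rows of the output frame (all illustrative assumptions):

```python
import numpy as np

def split_with_overlap(image, overlap=16):
    """Split a lenticularized image into two equal-width segments with
    `overlap` stitching columns duplicated at the seam: on the right
    edge of the top segment and the left edge of the bottom segment.
    Assumes the image width is even."""
    h, w = image.shape[:2]
    half = w // 2
    top = image[:, :half + overlap]       # left half + stitching columns
    bottom = image[:, half - overlap:]    # stitching columns + right half
    return np.vstack([top, bottom])       # stacked as two rows of the frame
```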

S400 can optionally include modifying a resolution of the lightfieldimage. Modifying the resolution is particularly, but not exclusively,beneficial for multilenticularized images where the image resolution hasbeen increased (e.g., approximately by the multiple) and is greater thanthe resolution of the display. Modifying the resolution preferablychanges (e.g., increases, decreases, stretches, squishes, etc.) anaspect ratio of the lightfield image to match the resolution of thedisplay. The resolution can be automatically and/or manually modified.The resolution can be modified by the display, by a computing system(e.g., a display computing system, cloud computing system, etc.), and/orotherwise be modified. However, modifying the resolution of thelightfield image can otherwise function.
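
A minimal sketch of the resolution modification, assuming OpenCV is available; nearest-neighbor interpolation is chosen here to match the point filtering used for multilenticularization, though other filters could be used:

```python
import cv2

def fit_to_display(image, display_w, display_h):
    """Rescale a (multi)lenticularized image to the display resolution,
    changing the aspect ratio as needed."""
    return cv2.resize(image, (display_w, display_h),
                      interpolation=cv2.INTER_NEAREST)
```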

5. Specific Examples

In a first example of S300, as shown in FIG. 4, generating the encoded light field image can include optionally generating a quilt image from the set of views of a light field image, arranging the set of views into an N×1 array of views, compressing the set of views as a video (e.g., using a video codec), selecting a thumbnail (e.g., key view from the light field image), and storing the video as metadata of the thumbnail.
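
One plausible realization of storing the video in the thumbnail's metadata is sketched below, using a PNG text chunk that holds the base64-encoded video bytes. The chunk name `lightfield_video` is an illustrative choice, not a standardized field, and other container formats would need different mechanisms.

```python
import base64
from PIL import Image
from PIL.PngImagePlugin import PngInfo

def embed_video_in_thumbnail(thumbnail_path, video_path, out_path):
    """Store an encoded light field video inside the metadata of a key
    view used as a thumbnail (out_path should be a .png file)."""
    with open(video_path, "rb") as f:
        payload = base64.b64encode(f.read()).decode("ascii")
    meta = PngInfo()
    meta.add_text("lightfield_video", payload)  # illustrative chunk name
    Image.open(thumbnail_path).save(out_path, pnginfo=meta)
```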

In a second example of S300, as shown in FIG. 5, S300 can include determining depth maps for a subset of the views (e.g., for every a-th view, where a is an integer) from a disparity vector determined between view a and view a+1, and storing the subset of views with their associated depth maps as a depth array.

In a third example of S300, as shown in FIG. 7, S300 can include determining depth maps for a subset of the views (e.g., for every a-th view) from a disparity vector determined between view a and view a+1, generating a 3D reconstruction of the scene (e.g., by determining the scene geometry based on the depth maps and the scene appearance based on the views), and storing the 3D reconstruction.

In a fourth example of S300, as shown in FIG. 9, generating an encoded light field video can include arranging views for each light field image into a 1-dimensional array by populating the array with views from the light field images in a zigzag arrangement (e.g., for each even indexed light field image, arranging the views in ascending order from 1 to N and, for each odd indexed light field image, arranging the views in descending order from N to 1; for each even indexed light field image, arranging the views in descending order from N to 1 and, for each odd indexed light field image, arranging the views in ascending order from 1 to N; etc.), determining a set of key views (e.g., the set of views that includes the first view corresponding to each light field image, the set of views that includes the first arranged view corresponding to each light field image, etc.) and/or key light field images, appending the view order for the light field image to the metadata in the key view for the corresponding light field image, and storing the arranged views in a video.
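
The zigzag flattening can be sketched as follows (a minimal sketch; which parity ascends is arbitrary, as noted above). Ordering the views this way keeps consecutive entries in the 1-dimensional video similar, which plays to a video codec's temporal prediction.

```python
def zigzag_order(n_frames, n_views):
    """Flatten a light field video into a 1-D sequence of (frame, view)
    indices: even-indexed frames emit views in ascending order and
    odd-indexed frames emit them in descending order, so adjacent
    entries in the flattened video remain similar."""
    order = []
    for f in range(n_frames):
        views = (range(n_views) if f % 2 == 0
                 else range(n_views - 1, -1, -1))
        order.extend((f, v) for v in views)
    return order
```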

In a fifth example of S300, as shown in FIG. 10, generating an encoded light field video can include arranging the views for each light field image; determining a set of key light field images (e.g., every second, third, fourth, fifth, tenth, twentieth, thirtieth, fortieth, fiftieth, hundredth, etc. light field image; random light field images; light field images selected according to a ruleset; etc.); determining a set of key views from the set of key light field images (e.g., the set of views that includes the first view corresponding to each light field image, the set of views that includes the first arranged view corresponding to each light field image, etc.); appending metadata (e.g., a view order, whether the view is a key view or a difference view, etc.) to each key view; iteratively, starting with each key light field image, calculating a difference between adjacent light field images, arranging the resultant difference views (e.g., in an M×1 array), and appending the difference views to the encoded light field image; and storing the encoded light field image(s) as a video (e.g., using a video codec). In a specific example, a light field video can be encoded as a 1-dimensional video including a timeseries of views. Each view from each light field image of the light field video can be included in the timeseries of views. Alternatively, each view of the timeseries of views can be a key view selected from each light field video frame, and is associated with the difference data calculated between the key view and the remaining views of the respective frame. Alternatively, each view of the 1-D video is a view from a key frame 408 of the light field video (e.g., arranged as discussed in the first embodiment of S230), wherein each view is associated with difference data calculated between the view and one or more corresponding views (e.g., sharing the same index) from the remaining frames of the light field video (e.g., successive frames of the light field video).
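
A minimal sketch of the difference-encoding step, assuming uint8 quilt arrays and an illustrative key-image interval; decoding is the sequential addition shown earlier:

```python
import numpy as np

def encode_differences(quilts, key_interval=10):
    """Encode a sequence of light field (quilt) images as periodic key
    images plus signed per-pixel differences against the preceding
    image; signed arithmetic avoids uint8 wraparound."""
    encoded = []
    for i, q in enumerate(quilts):
        if i % key_interval == 0:
            encoded.append(("key", q))
        else:
            diff = q.astype(np.int16) - quilts[i - 1].astype(np.int16)
            encoded.append(("diff", diff))
    return encoded
```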

In a sixth example, as shown in FIG. 17, S300 can include determining a polynomial representation 429 for a lightfield image.

However, the embodiments, variants, and/or subvariants of S300 can be combined in any manner and/or S300 can be performed in any manner.

In a first specific example, decoding the encoded light field image caninclude: accessing metadata from a thumbnail, extracting the views froma video stored in the metadata, and generating a light field image fromthe extracted views. In a variant of this specific example, decoding theencoded light field image can include: (a) accessing metadata from a keyview, (b) extracting the views associated with a light field image(e.g., of a light field video), (c) generating a light field image fromthe extracted views based on the view order (e.g., as determined fromthe metadata such as ascending, descending, zigzag, etc.), and repeating(a)-(c) for each subsequent light field image of the light field video.

In a second specific example, as shown in FIG. 6, decoding the encodedlight field image can include: accessing a depth array, determiningcorrespondences between views from the depth array, generatingintermediate views between the views of the depth array by interpolatingbetween the views, and generating a light field image from the views ofthe depth array and the intermediate views.

In a third specific example, as shown in FIG. 8, decoding the encodedlight field image can include: accessing a 3D reconstruction, generatinga virtual camera array (e.g., corresponding to a point of view,corresponding to the camera array, etc.), acquiring a set of virtualviews (e.g., virtual images) of the 3D reconstruction using the virtualcamera array, and generating a light field image using the set ofvirtual views.

In a fourth specific example, as shown in FIG. 11, decoding the encodedlight field image can include: extracting views associated with a keylight field image from an encoded light field image (e.g., as identifiedby the metadata in a key view), arranging the extracted views in a lightfield image, (a) extracting difference views associated with asubsequent frame from the encoded light field image, (b) arranging thedifference views into a difference light field image, (c) adding thelight field image and the difference light field image to generate thesubsequent light field image, repeating (a)-(c) for each differencelight field image.

In a fifth specific example, as shown in FIG. 16, decoding the encodedlight field image can include providing the encoded light field image toa neural network (e.g., using DAIN), generating views from the encodedlight field image using the neural network, and optionally arranging theviews in a quilt image. In this specific example, the encoded lightfield image can include a subset of views (e.g., 1 view, 2 views, 3views, 4 views, 6 views, 8 views, 10 views, etc.) and optionally depthinformation associated with each view.

However, the different embodiments and/or variants of decoding theencoded light field image can be combined in any suitable manner and/orthe encoded light field image can be decoded in any suitable manner.

Embodiments of the system and/or method can include every combinationand permutation of the various system components and the various methodprocesses, wherein one or more instances of the method and/or processesdescribed herein can be performed asynchronously (e.g., sequentially),concurrently (e.g., in parallel), or in any other suitable order byand/or using one or more instances of the systems, elements, and/orentities described herein.

As a person skilled in the art will recognize from the previous detaileddescription and from the figures and claims, modifications and changescan be made to the preferred embodiments of the invention withoutdeparting from the scope of this invention defined in the followingclaims.

We claim:
 1. A method for displaying a holographic image comprising:receiving a set of images, each image associated with a differentperspective of a scene; generating a quilt image by arranging the set ofimages starting with a first image associated with a first perspectiveand ending at a second image associated with a second perspective;lenticularizing the quilt image based on a calibration associated with adisplay; duplicating each pixel of the lenticularized quilt image atleast once to generate the holographic image, wherein each duplicatedpixel is adjacent to the corresponding original pixel; and displayingthe holographic image using the display.